Abstract

Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to 
yield the desired probabilistic guarantees are often too complicated to be practical. Here we survey recent 
results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees. 

Simple tabulation hashing dates back to Zobrist [1970]. Keys are viewed as consisting of $c$ characters
and we have precomputed character tables $h_1, \ldots, h_c$ mapping characters to random hash values. A key
$x = (x_1, \ldots, x_c)$ is hashed to $h_1[x_1] \oplus \cdots \oplus h_c[x_c]$. This scheme is very fast with character tables
in cache. While simple tabulation is not even 4-independent, it does provide many of the guarantees that
are normally obtained via higher independence, e.g., linear probing and Cuckoo hashing.

Next we consider twisted tabulation where one input character is “twisted” in a simple way. The
resulting hash function has powerful distributional properties: Chernoff-style tail bounds and a very 
small bias for min-wise hashing. This also yields an extremely fast pseudo-random number generator 
that is provably good for many classic randomized algorithms and data-structures. 

Finally, we consider double tabulation where we compose two simple tabulation functions, applying 
one to the output of the other, and show that this yields very high independence in the classic framework 
of Carter and Wegman [1977]. In fact, w.h.p., for a given set of size proportional to that of the space
consumed, double tabulation gives fully-random hashing. We also mention some more elaborate tabulation
schemes getting near-optimal independence for given time and space. 

While these tabulation schemes are all easy to implement and use, their analysis is not. 


1 Introduction 

A useful assumption in the design of randomized algorithms and data structures is the free availability of 
fully random hash functions which can be computed in unit time. Removing this unrealistic assumption 
is the subject of a large body of work. To implement a hash-based algorithm, a concrete hash function 
has to be chosen. The space, time, and random choices made by this hash function affect the overall
performance. The generic goal is therefore to provide efficient constructions of hash functions that for
important randomized algorithms yield probabilistic guarantees similar to those obtained assuming fully
random hashing.

To fully appreciate the significance of this program, we note that many randomized algorithms are very 
simple and popular in practice, but often they are implemented with too simple hash functions without the 
necessary guarantees. This may work very well in random tests, adding to their popularity, but the real world
is full of structured data, e.g., generated by computers, that could be bad for the hash function. This was
illustrated in [51] showing how simple common inputs made linear probing fail with popular hash functions, 
explaining its perceived unreliability in practice. The problems disappeared when sufficiently strong hash 
functions were used. 

In this paper we will survey recent results from [12, 13, 14, 15, 41, 42, 50] showing how simple realistic 
hashing schemes based on tabulation provide unexpectedly strong guarantees for many popular randomized 
algorithms, e.g., linear probing, Cuckoo hashing, min-wise independence, treaps, planar partitions,
power-of-two-choices, Chernoff-style concentration bounds, and even high independence. The survey is from a
user's perspective, explaining how these tabulation schemes can be applied. While these schemes are all
very simple to describe and use, the analysis showing that they work is non-trivial. For the analysis, the 
reader will be referred to the above papers. The reader is also referred to these papers for a historical 
account of previous work. 

Background Generally a hash function maps a key universe $\mathcal{U}$ of keys into some range $\mathcal{R}$ of hash values.
A random hash function $h$ is a random variable from $\mathcal{R}^{\mathcal{U}}$, assigning a random hash value $h(x) \in \mathcal{R}$ to every
$x \in \mathcal{U}$. A truly random hash function is picked uniformly from $\mathcal{R}^{\mathcal{U}}$, assigning a uniform and independent
hash value $h(x) \in \mathcal{R}$ to each key $x \in \mathcal{U}$. Often randomized algorithms are analyzed assuming access to
truly random hash functions. However, just storing a truly random hash function requires $|\mathcal{U}| \log_2 |\mathcal{R}|$ bits,
which is unrealistic for large key universes.

In general, the keys may originate from a very large universe $\mathcal{U}$. However, often we are only
interested in the performance on an unknown set $S \subseteq \mathcal{U}$ of up to $n$ keys. Then our first step is to do a
universe reduction, mapping $\mathcal{U}$ randomly to “signatures” in $[u] = \{0, 1, \ldots, u-1\}$, where $u = n^{O(1)}$, e.g.,
$u = n^3$, so that no two keys from $S$ are expected to get the same signature [9]. Below we generally assume
that this universe reduction has been done, if needed, hence that we “only” need to deal with keys from the 
polynomial universe [u]. 

The concept of $k$-independence was introduced by Wegman and Carter [52] in FOCS’79 and has been
the cornerstone of our understanding of hash functions ever since. As above, we think of a hash function
$h : [u] \to [m]$ as a random variable distributed over $[m]^{[u]}$. We say that $h$ is $k$-independent if (a) for any
distinct keys $x_1, \ldots, x_k \in [u]$, the hash values $h(x_1), \ldots, h(x_k)$ are independent random variables; and (b)
for any fixed $x$, $h(x)$ is uniformly distributed in $[m]$.

As the concept of independence is fundamental to probabilistic analysis, $k$-independent hash functions
are both natural and powerful in algorithm analysis. They allow us to replace the heuristic assumption
of truly random hash functions that are uniformly distributed in $[m]^{[u]}$, hence needing $u \lg m$ random bits
($\lg = \log_2$), with real implementable hash functions that are still “independent enough” to yield provable
performance guarantees similar to those proved with true randomness. We are then left with the natural goal
of understanding the independence required by hashing-based algorithms.

Once we have proved that $k$-independence suffices for a hashing-based randomized algorithm, we are
free to use any $k$-independent hash function. The canonical construction of a $k$-independent hash function is
based on polynomials of degree $k-1$. Let $p \ge u$ be prime. Picking random $a_0, \ldots, a_{k-1} \in \{0, \ldots, p-1\}$,
the hash function is defined by:

$$h(x) = \left(a_{k-1} x^{k-1} + \cdots + a_1 x + a_0\right) \bmod p \tag{1}$$

If we want to limit the range of hash values to $[m]$, we use $h(x) \bmod m$. This preserves requirement (a)
of independence among $k$ hash values. Requirement (b) of uniformity is close to satisfied if $p \gg m$. As
suggested in [9], for a faster implementation, we can let $p$ be a Mersenne prime, e.g., to hash 64-bit integers,
we could pick $p = 2^{89} - 1$.
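As a rough illustration of Equation (1), the following C sketch evaluates the polynomial by Horner's rule. To stay within 64-bit arithmetic it assumes the smaller Mersenne prime $p = 2^{31} - 1$ rather than $2^{89} - 1$, so it only applies to keys below $2^{31}$. The names `mod_mersenne31` and `poly_hash31` are ours and do not come from [9].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MERSENNE31 ((uint64_t)0x7FFFFFFF) /* assumed modulus p = 2^31 - 1 */

/* Reduce v modulo p = 2^31 - 1 with shifts and masks instead of a division,
   using 2^31 = 1 (mod p). Valid for any v < 2^63. */
static uint32_t mod_mersenne31(uint64_t v) {
    v = (v & MERSENNE31) + (v >> 31);
    v = (v & MERSENNE31) + (v >> 31);
    if (v >= MERSENNE31) v -= MERSENNE31;
    return (uint32_t)v;
}

/* Hypothetical helper: h(x) = (a[k-1] x^(k-1) + ... + a[1] x + a[0]) mod p.
   a[0..k-1] are the random coefficients, each in {0,...,p-1}; x must be < p. */
static uint32_t poly_hash31(const uint32_t *a, size_t k, uint32_t x) {
    assert(k >= 1 && x < MERSENNE31);
    uint64_t h = a[k - 1];
    for (size_t i = k - 1; i-- > 0;) {
        /* h < p and x < p, so h * x + a[i] < 2^62 + 2^31 fits in 64 bits. */
        h = mod_mersenne31(h * x + a[i]);
    }
    return (uint32_t)h;
}
```

To limit the range to $[m]$, the caller would reduce the result modulo $m$ as described above.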

Sometimes 2-independence suffices. For example, 2-independence implies so-called universality [9]; 
namely that the probability of two keys x and y colliding with h(x) = h(y) is 1/m; or close to 1/m if 
the uniformity of (b) is only approximate. Universality implies expected constant time performance of hash 
tables implemented with chaining. Universality also suffices for the 2-level hashing of Fredman et al. [24], 
yielding static hash tables with constant query time. Moreover, Mitzenmacher and Vadhan [34] have proved 
that 2-independent hashing in many applications works almost like truly random hashing if the input has 
enough entropy. However, structured, low-entropy data are very common in the real world.

We do have very fast implementations of universal and 2-independent hashing [16, 17], but unfortunately,
these methods do not generalize nicely to higher independence.

At the other end of the spectrum, when dealing with problems involving $n$ objects, $O(\lg n)$-independence
suffices in a vast majority of applications. One reason for this is the Chernoff bounds of [43] for $k$-independent
events, whose probability bounds differ from the full-independence Chernoff bound by $2^{-\Omega(k)}$.
Another reason is that random graphs with $O(\lg n)$-independent edges [3] share many of the properties of
truly random graphs.

When it comes to high independence, we note that the polynomial method from Equation (1) takes $O(k)$
time and space for $k$-independence. This is no coincidence. Siegel [46] has proved that to implement $k$-independence
with less than $k$ memory accesses, we need a representation using $u^{1/k}$ space. He also gives a
solution that for any $c$ uses $O(u^{1/c})$ space, $c^{O(c)}$ evaluation time, and achieves $u^{\Omega(1/c^2)}$ independence (which
is superlogarithmic, at least asymptotically, for $c = O(1)$). The construction is non-uniform, assuming a
certain small expander which gets used in a graph product. Siegel [46] states about his scheme that it is “far
too slow for any practical application”.

The independence measure has long been central to the study of randomized algorithms. For example,
[29] considers variants of Quicksort, [1] considers the maximal bucket size for hashing with chaining, and
[28, 19] consider Cuckoo hashing. In several cases [1, 19, 29], it is proved that linear transformations
$x \mapsto ((ax + b) \bmod p)$ do not suffice for good performance, hence that 2-independence is not in itself
sufficient.

This paper surveys a family of “tabulation” based hash functions that, like Siegel’s hash function, use
$O(u^{1/c})$ space for some constant $c$, but which are simple and practical, and offer strong probabilistic
guarantees for many popular randomized algorithms despite having low independence. We start with the
simplest and fastest tabulation scheme, and move later to more complicated schemes with stronger probabilistic
guarantees.

We note that there have been several previous works using tabulation to construct efficient hash
functions (see, e.g., [18, 20]), but the schemes discussed here are simpler and more efficient. More detailed
comparisons with previous works are found in the papers surveyed.

2 Simple tabulation 

The first scheme we consider is simple tabulation hashing where the hash values are $r$-bit numbers. Our goal
is to hash keys from $\mathcal{U} = [u]$ into the range $\mathcal{R} = [2^r]$. In tabulation hashing, a key $x \in [u]$ is interpreted as a
vector of $c > 1$ “characters” from the alphabet $\Sigma = [u^{1/c}]$, i.e., $x = (x_0, \ldots, x_{c-1}) \in \Sigma^c$. As a slight abuse
of notation, we shall sometimes use $\Sigma$ instead of $|\Sigma|$ to denote the size of the alphabet when the context
makes this meaning clear. This matches the classic recursive set-theoretic definition of a natural as the set
of smaller naturals.

For “simple tabulation hashing” we initialize independent random character tables $h_0, \ldots, h_{c-1} : \Sigma \to \mathcal{R}$.
The hash $h(x)$ of a key $x = (x_0, \ldots, x_{c-1})$ is computed as:

$$h(x) = \bigoplus_{i \in [c]} h_i[x_i]. \tag{2}$$

Here $\oplus$ denotes bit-wise exclusive-or. This is a well-known scheme dating back at least to Zobrist [53].
For him a character position corresponds to a position on a game board, and the character is the piece at
the position. If the piece at a position $i$ changes from $x_i$ to $x_i'$, he updates the overall hash value $h$ to

$$h' = h \oplus h_i[x_i] \oplus h_i[x_i'].$$
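The incremental update can be sketched in C as follows. This is only an illustration under assumed parameters ($c = 4$ positions, 8-bit characters, 32-bit hash values); the function names are hypothetical and the caller is assumed to have filled the tables `H` with random values.

```c
#include <assert.h>
#include <stdint.h>

enum { POSITIONS = 4, ALPHABET = 256 }; /* assumed c and alphabet size */

/* Simple tabulation: XOR of one table entry per character position. */
static uint32_t zobrist_hash(const uint8_t key[POSITIONS],
                             uint32_t H[POSITIONS][ALPHABET]) {
    uint32_t h = 0;
    for (int i = 0; i < POSITIONS; i++) h ^= H[i][key[i]];
    return h;
}

/* Incremental update when the character at position i changes from old_c
   to new_c: XOR out the old table entry and XOR in the new one. */
static uint32_t zobrist_update(uint32_t h, int i, uint8_t old_c, uint8_t new_c,
                               uint32_t H[POSITIONS][ALPHABET]) {
    assert(i >= 0 && i < POSITIONS);
    return h ^ H[i][old_c] ^ H[i][new_c];
}
```

The update costs two character lookups regardless of $c$, whereas recomputing the hash from scratch costs $c$ lookups.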

It is easy to see that simple tabulation is 3-independent, for if we have a set $X$ of two or three keys, then
there must be a position $i \in [c]$ where one key $x$ has a character $x_i$ not shared with any other key in $X$. This
means that $x$ is the only key in $X$ whose hash value depends on $h_i[x_i]$, so the hash value of $x$ is independent
of the other hash values from $X$. On the other hand, simple tabulation is not 4-independent. Given 4 keys
$(a_0, b_0)$, $(a_1, b_0)$, $(a_0, b_1)$, $(a_1, b_1)$, no matter how we fill our tables, we have

$$h(a_0, b_0) \oplus h(a_1, b_0) \oplus h(a_0, b_1) \oplus h(a_1, b_1) = 0.$$

Thus, given the hash values of any three of the keys, we can uniquely determine the fourth hash value.
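The dependence among the four keys can be checked mechanically. The sketch below assumes 16-bit keys split into $c = 2$ characters of 8 bits; the helper names are ours. Each table entry enters the XOR exactly twice, so the result is 0 whatever the tables contain.

```c
#include <stdint.h>

/* Simple tabulation for 16-bit keys viewed as two 8-bit characters.
   H[0] and H[1] are the character tables, filled however the caller likes. */
static uint32_t simple_tab16(uint16_t x, uint32_t H[2][256]) {
    return H[0][x & 0xFF] ^ H[1][x >> 8];
}

/* XOR of the hash values of the keys (a0,b0), (a1,b0), (a0,b1), (a1,b1). */
static uint32_t xor_of_rectangle(uint8_t a0, uint8_t a1, uint8_t b0, uint8_t b1,
                                 uint32_t H[2][256]) {
    uint16_t k00 = (uint16_t)(a0 | (b0 << 8));
    uint16_t k10 = (uint16_t)(a1 | (b0 << 8));
    uint16_t k01 = (uint16_t)(a0 | (b1 << 8));
    uint16_t k11 = (uint16_t)(a1 | (b1 << 8));
    return simple_tab16(k00, H) ^ simple_tab16(k10, H)
         ^ simple_tab16(k01, H) ^ simple_tab16(k11, H);
}
```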

In our context, we assume that the number $c = O(1)$ of character positions is constant, and that character
tables fit in fast cache. Justifying this assumption, recall that if we have $n$ keys from a very large universe,
we can first do a universe reduction, applying a universal hashing function [9] into an intermediate universe
of size $u = n^{O(1)}$, expecting no collisions (sometimes, as in [48], we can even accept some collisions from
the first universal hashing, and hence use an intermediate universe size closer to $n$). Now, for any desired
small constant $\varepsilon > 0$, we get down to space $O(n^{\varepsilon})$ picking $c = O(1)$ so that $\Sigma = u^{1/c} \le n^{\varepsilon}$. We shall refer
to the lookups in the $h_i$ as “character lookups”, emphasizing that they are expected to be much faster than a
general lookup in a table of size $n$.

Putting things into a practical perspective (this paper claims practical schemes with strong theoretical 
guarantees), in the experiments from [41, 49], for 32-bit keys, simple tabulation with 4 character lookups 
took less than 5ns whereas a single memory lookup in a 4MB table took more than 120ns. Character lookups 
were thus about 100 times faster than general lookups! We note that the character lookups parallelize easily 
and this may or may not have been exploited in the execution (we just wrote portable C-code, leaving the rest 
to compiler and computer). Simple tabulation was only 60% slower than the 2-independent multiply-shift 
scheme from [16] which for 32-bit keys is dominated by a single 64-bit multiplication. However, simple 
tabulation is 3-independent and in experiments from [41], it was found to be more than three times faster 
than 3-independent hashing implemented by a degree-2 polynomial tuned over the Mersenne prime field
$\mathbb{Z}_{2^{61}-1}$. The high speed of simple tabulation conforms with experiments on much older architectures [47].
Because cache is so critical to computation, most computers are configured with a very fast cache, and this
is unlikely to change. 

Usually it is not a problem to fill the character tables $h_0, \ldots, h_{c-1}$ with random numbers, e.g., downloading
them from http://random.org which is based on atmospheric noise. However, for the theory
presented here, it would suffice to fill them with a strong enough pseudo-random number generator (PRG),
like a $(\lg u)$-independent hash function, e.g., using the new fast generation from [11]. The character tables
just need to point to an area in memory with random bits, and this could be shared across many applications.
One could even imagine computers configured with random bits in some very fast read-only memory
allowing parallel access from multiple cores.

In [41] simple tabulation hashing was proved to have much more power than suggested by its
3-independence. This included fourth moment bounds, min-wise hashing, random graph properties necessary
in cuckoo hashing [39], and Chernoff bounds for distributing balls into many bins. The details follow below. 

Concentration bounds First we consider using simple tabulation hashing to distribute $n$ balls into $m = 2^r$
bins, that is, assuming that the balls have keys from $[u]$, we are using a simple tabulation hash function
$h : [u] \to [m]$. In a hash table with chaining, the balls in a bin would be stored in a linked list.

Consider the number $X$ of balls landing in a given bin. We have $\mu = \mathrm{E}[X] = n/m$. Pătraşcu and
Thorup [41] have proved that, w.h.p., we get a Chernoff-style concentration on $X$. First recall the classic
Chernoff bounds [36, §4] for full randomness. On the upper bound side, we have [36, Theorem 4.1]

$$\Pr[X \ge (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right)^{\mu} \quad \left[\le \exp(-\delta^2\mu/3) \text{ for } \delta \le 1\right] \tag{3}$$

The corresponding probabilistic lower bound [36, Proof of Theorem 4.2] for $\delta \le 1$ is

$$\Pr[X \le (1-\delta)\mu] \le \left(\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right)^{\mu} \quad \left[\le \exp(-\delta^2\mu/2) \text{ for } \delta \le 1\right] \tag{4}$$
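As a small worked instance of the upper bound (3), with numbers chosen only for illustration, take $\mu = 12$ and $\delta = 1$, i.e., we ask how likely it is that a bin with 12 expected balls receives at least 24:

```latex
\Pr[X \ge 24] \le \left(\frac{e}{2^{2}}\right)^{12} = e^{12(1 - 2\ln 2)} \approx e^{-4.64} \approx 0.0097,
\qquad\text{whereas}\qquad \exp(-\delta^2\mu/3) = e^{-4} \approx 0.018 .
```

The bracketed simplification is thus weaker than the main bound by roughly a factor of two here, but it is easier to manipulate.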

We note that in connection with hash tables, we are often not just interested in a given bin, but rather we
care about the bin that a specific query ball lands in. This is why the hash of the query ball is involved in the
theorem below with Chernoff-style bounds for simple tabulation.

Theorem 1 ([41]) Consider hashing $n$ balls into $m \ge n^{1-1/(2c)}$ bins by simple tabulation (recall that
$c = O(1)$ is the number of characters). Define $X$ as the number of regular balls that hash into a given bin
or a bin chosen as a function of the bin $h(q)$ of an additional query ball $q$. The following probability bounds
hold for any constant $\gamma$:

$$\Pr[X \ge (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right)^{\Omega(\mu)} + 1/m^{\gamma} \tag{5}$$

$$\Pr[X \le (1-\delta)\mu] \le \left(\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right)^{\Omega(\mu)} + 1/m^{\gamma}. \tag{6}$$

With $m \le n$ bins (including $m < n^{1-1/(2c)}$), every bin gets

$$n/m \pm O\!\left(\sqrt{n/m}\,\log^{c} n\right) \tag{7}$$

balls with probability $1 - n^{-\gamma}$.

Contrasting the standard Chernoff bounds, we see that Theorem 1 (5) and (6) can only provide polynomially
small probability, i.e. at least $m^{-\gamma}$ for any desired constant $\gamma$. This corresponds to what we would get with
$\Theta(\log m)$-independence in the Chernoff bound from [43]. In addition, the exponential dependence on $\mu$ is
reduced by a constant which depends (exponentially) on the constants $\gamma$ and $c$.

The upper bound (5) implies that any given bin has $O(\lg n / \lg\lg n)$ balls w.h.p., but then this holds for
all $m$ bins w.h.p. Simple tabulation is the simplest and fastest constant time hash function to achieve this
fundamental property.

Complementing the above Chernoff-style bound, Dahlgaard et al. [13] have proved that we also get the
$k$th moment bounds normally associated with $k$-independence.

Theorem 2 ([13]) With the same setup as in Theorem 1, for any constant $k = O(1)$,

$$\mathrm{E}\left[(X - \mu)^{k}\right] = O\!\left(\mu + \mu^{k/2}\right).$$


For $k \le 4$, this bound was proved in [41], and disallowing the dependence on a query ball, it was also proved
in [6], again for $k \le 4$. Compelling applications of 4th moment bounds are given by [2], [29], and [38]. In
[29], it was shown that any hash function with a good 4th moment bound suffices for a non-recursive version
of Quicksort, routing on the hypercube, etc. In [2], the 4th moment is used to estimate the 2nd moment
of a data stream. In [38], limited 4th moment is shown to imply constant expected performance for linear
probing. The applications in [29, 38] both require dependence on a query bin.

Since $k$th moment bounds are one of the main ways $k$-independence is used, it is nice that they are
achieved by simple tabulation which is only 3-independent.


Linear probing Theorem 1 is used in [41] to get bounds for linear probing. Linear probing is a classic
implementation of hash tables. It uses a hash function $h$ to map a set of $n$ keys into an array of size $m$. When
inserting $x$, if the desired location $h(x) \in [m]$ is already occupied, the algorithm scans $h(x)+1, h(x)+2,
\ldots, m-1, 0, 1, \ldots$ until an empty location is found, and places $x$ there. The query algorithm starts at
$h(x)$ and scans until it either finds $x$, or runs into an empty position, which certifies that $x$ is not in the
hash table. When the query search is unsuccessful, that is, when $x$ is not stored, the query algorithm scans
exactly the same locations as an insert of $x$. A general bound on the query time is hence also a bound on the
insertion time.
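The scan described above can be written as a minimal C sketch. It assumes 32-bit keys, a reserved sentinel value marking empty slots, and that the caller passes in the hash value $h(x) \in [m]$ computed by whatever hash function is in use; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EMPTY_SLOT UINT32_MAX /* assumed sentinel: this key value cannot be stored */

/* Scan h, h+1, ..., m-1, 0, 1, ... until we find x or an empty slot.
   Returns the position where the scan stopped, or m if the table is full
   and x is absent. Queries and inserts share this scan. */
static size_t lp_scan(const uint32_t *table, size_t m, size_t h, uint32_t x) {
    assert(m > 0 && h < m);
    for (size_t step = 0; step < m; step++) {
        size_t pos = (h + step) % m;
        if (table[pos] == x || table[pos] == EMPTY_SLOT) return pos;
    }
    return m;
}

static bool lp_contains(const uint32_t *table, size_t m, size_t h, uint32_t x) {
    size_t pos = lp_scan(table, m, h, x);
    return pos < m && table[pos] == x;
}

static bool lp_insert(uint32_t *table, size_t m, size_t h, uint32_t x) {
    size_t pos = lp_scan(table, m, h, x);
    if (pos == m) return false; /* table full */
    table[pos] = x;             /* no change if x was already stored there */
    return true;
}
```

Because `lp_contains` and `lp_insert` call the same scan, an unsuccessful query visits exactly the locations an insert of the same key would visit.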

This classic data structure is one of the most popular implementations of hash tables, due to its
unmatched simplicity and efficiency. The practical use of linear probing dates back at least to 1954 to an
assembly program by Samuel, Amdahl, Boehme (cf. [31]). On modern architectures, access to memory is
done in cache lines (of much more than a word), so inspecting a few consecutive values typically translates
into a single memory access. Even if the scan straddles a cache line, the behavior will still be better than
a second random memory access on architectures with prefetching. Empirical evaluations [5, 26, 39]
confirm the practical advantage of linear probing over other known schemes, while cautioning [26, 51] that it
behaves quite unreliably with weak hash functions.

Linear probing was shown to take expected constant time for any operation in 1963 by Knuth [30],
in a report which is now regarded as the birth of algorithm analysis. This analysis, however, assumed a
truly random hash function. Later, Pagh et al. [38] showed that just 5-independence suffices for this
expected constant operation time. In [40], 5-independence was proved to be best possible by presenting a
concrete combination of keys and a 4-independent random hash function where searching certain keys takes
$\Omega(\log n)$ expected time.

In [41], the result from [38] is strengthened for more filled linear probing tables, showing that if the
table size is $m = (1+\varepsilon)n$, then the expected time per operation is $O(1/\varepsilon^2)$, which asymptotically matches
the bound of Knuth [30] with truly random hashing. More important for this paper, [41] proved that this
performance bound also holds with simple tabulation hashing.

In fact, for simple tabulation, we get quite strong concentration results for the time per operation, e.g.,
constant variance for constant $\varepsilon$. For contrast, with 5-independent hashing, the variance is only known to be
$O(\log n)$ [38, 51].

Some experiments are done in [41] comparing simple tabulation with the fast 2-independent multiply- 
shift scheme from [16] in linear probing. For simple inputs such as consecutive integers, the performance 
was extremely unreliable with the 2-independent hashing, but with simple tabulation, everything worked 
perfectly as expected from the theoretical guarantees.

Cuckoo hashing In cuckoo hashing [39] we use two tables of size $m \ge (1+\varepsilon)n$ and independent hash
functions $h_0$ and $h_1$ mapping the keys to these two tables. Cuckoo hashing succeeds if we can place every
key in one of its two hash locations without any collision. We can think of this as a bipartite graph with
a set for each table and an edge $(h_0(x), h_1(x))$ for each key $x$. Cuckoo hashing fails exactly if this graph
has a component with more edges than vertices. With truly random hashing, this bad event happens with
probability $\Theta(1/n)$. Pătraşcu and Thorup [41] study the random graphs induced by simple tabulation, and
obtain a rather unintuitive result: the worst failure probability is inversely proportional to the cube root of
the set size.

Theorem 3 ([41]) Any set of $n$ keys can be placed in two tables of size $m = (1+\varepsilon)n$ by cuckoo hashing
and simple tabulation with probability $1 - O(n^{-1/3})$. There exist sets on which the failure probability is
$\Omega(n^{-1/3})$.
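The graph criterion for cuckoo hashing (failure exactly when some component has more edges than vertices) can be tested with a union-find structure. The C sketch below is our own illustration, not the construction from [39] or [41]: the caller supplies the two hash locations of each key, vertices $0, \ldots, m-1$ stand for the first table and $m, \ldots, 2m-1$ for the second.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static size_t uf_find(size_t *parent, size_t v) {
    while (parent[v] != v) {
        parent[v] = parent[parent[v]]; /* path halving */
        v = parent[v];
    }
    return v;
}

/* h0[i], h1[i] in [m] are the two hash locations of key i.
   Returns true iff no component of the bipartite graph has more edges than
   vertices. For simplicity an allocation failure is also reported as false. */
static bool cuckoo_placeable(const size_t *h0, const size_t *h1, size_t n, size_t m) {
    assert(m > 0);
    size_t *parent = malloc(2 * m * sizeof *parent);
    size_t *verts = malloc(2 * m * sizeof *verts);
    size_t *edges = calloc(2 * m, sizeof *edges);
    bool ok = parent && verts && edges;
    for (size_t v = 0; ok && v < 2 * m; v++) { parent[v] = v; verts[v] = 1; }
    for (size_t i = 0; ok && i < n; i++) {
        size_t a = uf_find(parent, h0[i]);
        size_t b = uf_find(parent, m + h1[i]);
        if (a != b) { /* the new edge merges component b into component a */
            parent[b] = a;
            verts[a] += verts[b];
            edges[a] += edges[b];
        }
        edges[a]++;
        /* The excess edges - vertices never decreases under later merges,
           so it is safe to stop at the first violation. */
        if (edges[a] > verts[a]) ok = false;
    }
    free(parent); free(verts); free(edges);
    return ok;
}
```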

Thus, cuckoo hashing with simple tabulation is an excellent construction for a static dictionary. The
dictionary can be built (in linear time) after trying $O(1)$ independent hash functions w.h.p., and later every
query runs in constant worst-case time with two probes. We note that even though cuckoo hashing requires
two independent hash functions, these essentially come for the cost of one in simple tabulation: the pair of
hash codes can be stored consecutively, in the same cache line, making the running time comparable with
evaluating just one hash function.

In the dynamic case, Theorem 3 implies that we expect $\Omega(n^{4/3})$ updates between failures requiring a
complete rehash with new hash functions.

Minwise independence In [41] it is shown that simple tabulation is $\varepsilon$-minwise independent, for a vanishingly
small $\varepsilon$ (inversely polynomial in the set size $n$). This takes $\Theta(\log n)$-independence by general
techniques [27, 40]. More precisely, we have

Theorem 4 ([41]) Consider a set $S \subseteq \Sigma^c$ of $n = |S|$ keys and $q \in S$. If $h : \Sigma^c \to [m]$, $m \ge n^{1+1/c}$ is
implemented by simple tabulation:

$$\Pr[h(q) = \min h(S)] = \frac{1 \pm \varepsilon}{n}, \quad \text{where } \varepsilon = O\!\left(\frac{\lg^2 n}{n^{1/c}}\right). \tag{8}$$

The classic application of $\varepsilon$-minwise hashing of Broder [8, 7] is the estimation of Jaccard set similarity
$|A \cap B| / |A \cup B|$. Ignoring the probability of collisions in the minimum hash value, we get

$$\Pr[\min h(A) = \min h(B)] = \sum_{x \in A \cap B} \Pr[h(x) = \min h(A \cup B)] = \frac{|A \cap B|}{|A \cup B|} \cdot \left(1 \pm \tilde{O}\!\left(\frac{1}{|A \cup B|^{1/c}}\right)\right).$$
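A minimal C sketch of this estimator follows, assuming 32-bit keys, $c = 4$ characters, and 64 independent simple tabulation functions. The table filling uses splitmix64 purely as a stand-in random source for the example; the function names and the number of hash functions are our choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NUM_HASHES = 64 }; /* assumed number of independent hash functions */

/* splitmix64: stand-in pseudo-random generator used only to fill the tables. */
static uint64_t splitmix64(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

static void fill_tables(uint32_t H[NUM_HASHES][4][256], uint64_t seed) {
    for (int j = 0; j < NUM_HASHES; j++)
        for (int i = 0; i < 4; i++)
            for (int c = 0; c < 256; c++)
                H[j][i][c] = (uint32_t)splitmix64(&seed);
}

static uint32_t simple_tab32(uint32_t x, uint32_t H[4][256]) {
    uint32_t h = 0;
    for (int i = 0; i < 4; i++, x >>= 8) h ^= H[i][x & 0xFF];
    return h;
}

/* Smallest hash value over a non-empty set of keys. */
static uint32_t min_hash(const uint32_t *set, size_t n, uint32_t H[4][256]) {
    assert(n > 0);
    uint32_t best = UINT32_MAX;
    for (size_t k = 0; k < n; k++) {
        uint32_t h = simple_tab32(set[k], H);
        if (h < best) best = h;
    }
    return best;
}

/* Fraction of the hash functions on which min h(A) = min h(B). */
static double jaccard_estimate(const uint32_t *A, size_t na,
                               const uint32_t *B, size_t nb,
                               uint32_t H[NUM_HASHES][4][256]) {
    int agree = 0;
    for (int j = 0; j < NUM_HASHES; j++)
        if (min_hash(A, na, H[j]) == min_hash(B, nb, H[j])) agree++;
    return (double)agree / NUM_HASHES;
}
```

Averaging over more hash functions reduces the variance of the estimate but not its bias.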

For better bounds on the probabilities, we would make multiple experiments with independent hash
functions, yet this cannot eliminate the bias $\varepsilon$.

The power of two choices The power of two choices is a standard scheme for placing balls into bins
where each ball hashes to two bins, and is placed in the least loaded one (see [33] for a survey). When
placing $n$ balls into $n$ bins, using the two-choice paradigm with truly random hash functions, the maximum
load of any bin is $\lg\lg n + O(1)$ w.h.p. [4]. Dahlgaard et al. [14] have proved that simple tabulation gives a
maximum load which is $\lg\lg n + O(1)$ in expectation and $O(\log\log n)$ w.h.p. This is the simplest constant
time hashing scheme known to offer such strong two-choice load balancing.
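The placement rule itself is tiny. In the C sketch below (our own helper names), the caller computes the two candidate bins of a ball, e.g., with two simple tabulation functions, and ties are broken in favor of the first candidate, which is an arbitrary choice on our part.

```c
#include <stddef.h>
#include <stdint.h>

/* Place one ball whose two candidate bins are b0 and b1.
   The ball goes to the less loaded bin (ties go to b0).
   Returns the chosen bin. */
static size_t place_two_choice(uint32_t *load, size_t b0, size_t b1) {
    size_t chosen = (load[b1] < load[b0]) ? b1 : b0;
    load[chosen]++;
    return chosen;
}

/* Maximum load over all m bins. */
static uint32_t max_load(const uint32_t *load, size_t m) {
    uint32_t best = 0;
    for (size_t b = 0; b < m; b++)
        if (load[b] > best) best = load[b];
    return best;
}
```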

Weakness with small numbers As described above, simple tabulation has much more power than suggested
by its 3-independence. However, there are also some weaknesses. For example, in the Chernoff-style
bounds (5) and (6) from Theorem 1, we have an additive error probability of $1/m^{\gamma}$ when hashing into $m$
bins. Here $\gamma$ is an arbitrarily large constant, so this is fine when $m$ is large. However, this is not good if $m$
is small, e.g., $m = 2$ as when we toss a coin. A somewhat related problem is that our bias $\varepsilon$ in $\varepsilon$-minwise
independence in (8) is $\tilde{O}(1/n^{1/c})$ where $n$ is the size of the set considered. This is fine if the set is large, but
not if the set is small. Both of these problems and more will be addressed by twisted tabulation described
below.


3 Twisted tabulation 

We will now consider twisted tabulation proposed by Pătraşcu and Thorup in [42]. It adds a quick twist
to simple tabulation, leading to more general distributional properties, including Chernoff bounds that also
work for few bins and better minwise hashing that also works well for small sets. For $i = 1, \ldots, c-1$,
we expand the entries of $h_i$ with a random character called the “twister”. More precisely, for $i > 0$, we
now have random tables $h_i^* : \Sigma \to \Sigma \times \mathcal{R}$. This adds $\lg \Sigma$ bits to each entry of the table (in practice, we
want entries to have bit lengths like 32 or 64, so depending on $\mathcal{R}$, the twister may, or may not, increase the
actual entry size). The table $h_0 : \Sigma \to \mathcal{R}$ is kept unchanged. The hash function is now computed in two
statements:


$$(t, h_{>0}) = \bigoplus_{i=1}^{c-1} h_i^*[x_i]; \tag{9}$$

$$h(x) = h_{>0} \oplus h_0[x_0 \oplus t].$$

Figure 1 contains the C-code for simple and twisted tabulation. 

The highlight of twisted tabulation is its minimalistic nature, adding very little to the cost of simple
tabulation (but, as we shall see, with significantly stronger guarantees). Twisted tabulation uses exactly $c$
character lookups into tables with $\Sigma$ entries, just like simple tabulation, though with larger entries.
Essentially twisted tabulation only differs from simple tabulation by two $AC^0$ operations, so we would expect it
to be almost as fast as simple tabulation (whose practicality has long been established). This was confirmed
experimentally in [42], where twisted tabulation was less than 30% slower than simple tabulation, and still
nearly three times faster than a second degree polynomial.

#include <stdint.h> // defines uintX_t as unsigned X-bit integer.

uint32_t SimpleTab32(uint32_t x, uint32_t H[4][256]) {
  uint32_t i;
  uint32_t h=0;
  uint8_t c;
  for (i=0;i<4;i++) {
    c=x;            // next 8-bit character of the key
    h^=H[i][c];
    x = x >> 8;
  }
  return h;
}

uint32_t TwistedTab32(uint32_t x, uint64_t H[4][256]) {
  uint32_t i;
  uint64_t h=0;
  uint8_t c;
  for (i=0;i<3;i++) {
    c=x;            // next 8-bit character of the tail
    h^=H[i][c];
    x = x >> 8;
  }
  c=x^h;            // extra xor compared with simple
  h^=H[i][c];
  h>>=32;           // extra shift compared with simple
  return ((uint32_t) h);
}

Figure 1: C-code for simple and twisted tabulation for 32-bit keys assuming a pointer H to some randomly
filled storage (4KB for simple and 8KB for twisted).

When we discuss properties of twisted tabulation, we view keys $x = (x_0, \ldots, x_{c-1})$ as composed of
a head, head$(x) = x_0$ and a tail, tail$(x) = (x_1, \ldots, x_{c-1})$. We refer to the following implementation of
twisted tabulation which is less efficient but mathematically equivalent to (9):

1. Pick a simple tabulation hash function $h^{\tau} : \Sigma^{c-1} \to \Sigma$ from $c-1$ characters to 1 character. This
corresponds to the twister components of $h_1^*, \ldots, h_{c-1}^*$. Applying $h^{\tau}$ to the tail of a key $x$, we get
the combined twister $t = h^{\tau}(\mathrm{tail}(x))$, the “twisted head” $x_0 \oplus t$, and the “twisted key”
$h^{T}(x) = (x_0 \oplus t, x_1, \ldots, x_{c-1})$.

2. Pick a simple tabulation hash function $h^{S} : \Sigma^{c} \to \mathcal{R}$ (where $\mathcal{R}$ was the desired output range). This
corresponds to $h_0$ and, for $i > 0$, the non-twister component $h_i$ of $h_i^*$ (remember that tails are not
touched by twisting). The twisted tabulation hash function is then $x \mapsto h^{S}(h^{T}(x))$.
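The equivalence of the two-step view with the fused evaluation can be sketched in C for 32-bit keys. As in Figure 1, this sketch assumes that each 64-bit table entry carries the twister in its low 8 bits and the hash value in its high 32 bits, and that the head is the most significant byte of the key; the helper names are ours.

```c
#include <stdint.h>

/* Fused evaluation in the style of Figure 1. */
static uint32_t twisted_fused(uint32_t x, uint64_t H[4][256]) {
    uint64_t h = 0;
    for (int i = 0; i < 3; i++, x >>= 8) h ^= H[i][x & 0xFF];
    uint8_t c = (uint8_t)(x ^ h); /* twisted head: head XOR twister */
    h ^= H[3][c];
    return (uint32_t)(h >> 32);
}

/* Step 1: twister t = simple tabulation of the tail, using only the
   twister (low 8 bits) component of the entries. */
static uint8_t twister_of_tail(uint32_t x, uint64_t H[4][256]) {
    uint8_t t = 0;
    for (int i = 0; i < 3; i++, x >>= 8) t ^= (uint8_t)H[i][x & 0xFF];
    return t;
}

/* The twisted key: replace the head by the twisted head, keep the tail. */
static uint32_t twist_key(uint32_t x, uint64_t H[4][256]) {
    return x ^ ((uint32_t)twister_of_tail(x, H) << 24);
}

/* Step 2: simple tabulation of the twisted key, using only the hash value
   (high 32 bits) component of the entries. */
static uint32_t simple_part(uint32_t y, uint64_t H[4][256]) {
    uint32_t h = 0;
    for (int i = 0; i < 4; i++, y >>= 8) h ^= (uint32_t)(H[i][y & 0xFF] >> 32);
    return h;
}

static uint32_t twisted_two_step(uint32_t x, uint64_t H[4][256]) {
    return simple_part(twist_key(x, H), H);
}
```

Since the tail is untouched, applying `twist_key` twice returns the original key, which shows that twisting is a permutation of the key space.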

For all the results presented here, it does not matter which character we view as the head. Above it is 
the first character, but sometimes it is more efficient if it is the last least significant character. 

As noted in [42], the twisting by $h^{T}$ can be seen as a single-round Feistel permutation where the hash
function is simple tabulation. Because twisting is a permutation, twisted tabulation inherits from simple
tabulation any (probabilistic) property that holds regardless of concrete key values, e.g., the above Cuckoo
hashing and the power of two choices. Like simple tabulation, twisted tabulation is only 3-independent, but
it does have some more general distributional guarantees, which we explain in detail below.

Chernoff Bounds Chernoff bounds play a prominent role in the design of randomized algorithms [36,
§4]. The Chernoff-style bounds from Theorem 1 were limited in that they only really worked for throwing
balls into a large number $m$ of bins. In [42], Pătraşcu and Thorup prove the following far more general
Chernoff-style bounds for twisted tabulation.

Theorem 5 ([42]) Choose a random $c$-character twisted tabulation hash function $h = h^{S} \circ h^{T} : [u] \to [u]$,
$[u] = \Sigma^{c}$. For each key $x \in [u]$ in the universe, we have an arbitrary “value function” $v_x : [u] \to [0,1]$
assigning a value $V_x = v_x(h(x)) \in [0,1]$ to $x$ for each hash value. Define $V = \sum_{x \in [u]} V_x$ and
$\mu = \sum_{x \in [u]} \mu_x$ where $\mu_x = \mathrm{E}[v_x(h(x))]$. Let $\gamma$, $c$ and $\varepsilon > 0$ be constants. Then for any $\mu < \Sigma^{1-\varepsilon}$ and $\delta > 0$,
we have:

$$\Pr[V \ge (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right)^{\Omega(\mu)} + 1/u^{\gamma} \tag{10}$$

$$\Pr[V \le (1-\delta)\mu] \le \left(\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right)^{\Omega(\mu)} + 1/u^{\gamma} \tag{11}$$

Moreover, for any $\mu \ge \sqrt{\Sigma}$ (including $\mu \ge \Sigma^{1-\varepsilon}$), with probability $1 - u^{-\gamma}$,

$$V = \mu \pm \tilde{O}(\sqrt{\mu}). \tag{12}$$

If we have a given distinguished query key $q \in [u]$, the above bounds hold even if we condition everything
on $h(q) = a$ for any given $a \in [u]$.$^1$

The statement of Theorem 5 may seem a bit cryptic, so before proceeding, we will show that it improves
the simple tabulation bounds from Theorem 1. Those bounds considered a set $S$ of $n$ balls or keys mapped
into $m$ bins, and had an error bound of $m^{-\gamma}$. The error is here improved to $u^{-\gamma}$.

We now do the translation. Discounting all irrelevant keys $x \in [u] \setminus S$, we zero their value functions,
setting $v_x(\cdot) = 0$. Also, we define the bin hash function from Theorem 1 as $h'(x) = h(x) \bmod m$, noting
that $m \mid u$ since both are powers of two. Theorem 1 studies the number of balls landing in a bin $b$ which
may be a function of the bin $h'(q)$ of a query ball $q \in [u] \setminus S$. Thanks to the last statement of Theorem
5, we can condition on any value $a$ of $h(q)$, which determines $h'(q)$ and hence $b$. Now, for $x \in S$, define
$V_x = v_x(h(x)) = 1$ if $h'(x) = b$; 0 otherwise. Now $V = \sum_{x \in S} V_x$ is the variable $X$ from Theorem 1, and
(5) and (6) follow from (10) and (11), but with the improved error $u^{-\gamma}$. This is important when $m$ is small,
e.g., if $m = 2$ corresponding to unbiased coin tosses.

$^1$This last statement conditioning on $h(q) = a$ was not proved in [42], but it is an easy extension provided in Appendix A.

A different illustration of the versatility of Theorem 5 is if we want each key $x \in [u]$ to be sampled
with some individual sampling probability $p_x$. In this case we have no distinguished query key, and we can
just define $v_x(y) = 1$ if $y < p_x \cdot m$; otherwise $v_x(y) = 0$. Since h(x) is uniform in [m], we have that x
is sampled with $V_x = v_x(h(x)) = 1$ with probability $\tilde{p}_x = \lceil p_x m \rceil / m$. The number $V = \sum_{x \in [u]} V_x$ of
samples is now concentrated according to (10) and (11).
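
The rounding in the sampling probability can be checked with a few lines of Python. This is only a sketch of the arithmetic, not code from [42]: the helper names `value_function` and `realized_probability` and the concrete values of m and $p_x$ are illustrative assumptions, and exact rationals are used to avoid floating-point effects.

```python
from fractions import Fraction
from math import ceil

def value_function(p_x, m):
    """Return v_x with v_x(y) = 1 if y < p_x * m, else 0, for y in [m]."""
    threshold = Fraction(p_x) * m
    return lambda y: 1 if y < threshold else 0

def realized_probability(p_x, m):
    """Probability that v_x(h(x)) = 1 when h(x) is uniform in [m]."""
    v = value_function(p_x, m)
    return Fraction(sum(v(y) for y in range(m)), m)

# Hypothetical parameters: range size m = 16, target probability 1/3.
m = 16
p = Fraction(1, 3)
# Exactly ceil(p * m) of the m hash values fall below the threshold.
assert realized_probability(p, m) == Fraction(ceil(p * m), m)
```

When $p_x m$ is an integer the realized probability equals $p_x$ exactly; otherwise it is rounded up by less than 1/m.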

Minwise independence Concerning minwise hashing, Dahlgaard and Thorup [15] have proved that twisted 
tabulation yields the following strengthening of Theorem 4 for simple tabulation. 

Theorem 6 ([15]) Consider a set $S \subset \Sigma^c$ of $n = |S|$ keys and $q \in S$. If $h : \Sigma^c \to [m]$, $m \ge n u^{1/c}$, is
implemented by twisted tabulation then:

$$\Pr[h(q) = \min h(S)] = \frac{1 \pm \varepsilon}{n}, \quad \text{where } \varepsilon = O\!\left(\frac{\lg^2 u}{u^{1/c}}\right). \tag{13}$$

The important difference is that the bias $O(\lg^2 n / n^{1/c})$ from Theorem 4 is replaced by $O(\lg^2 u / u^{1/c})$, which is small
regardless of the set size. Such an absolutely small bias generally requires $\Omega(\log u)$-independence [40].

Short range amortization for chaining and linear probing We now switch to a quite different illustration
of the power of twisted tabulation hashing from [42]. Consider linear probing in a half-full hash table
with n keys. Out of $\sqrt{n}$ operations, we expect some to take $\Omega(\log n)$ time. Nevertheless we show that any
window of $\log n$ operations on distinct keys is executed in $O(\lg n)$ time with high probability. This also
holds for the simpler case of chaining.

The general point is that for any set of stored keys and any set of window keys, the operation times
within the window are sufficiently independent that the average concentrates nicely around the expected
constant operation time. Such concentration of the average should not be taken for granted with real hash
functions. In [40] there are input examples for linear probing with multiplication-shift hashing [16] such that if
one operation is slow, then most operations are slow. In [42], a parallel universe construction is presented
that causes similar problems for simple tabulation. As stated above, twisted tabulation does, however, provide
sufficient independence, and we expect this to prove useful in other applications.

Pseudo-random number generators Like any hash function, twisted tabulation naturally implies a
pseudo-random number generator (PRG) with the pseudo-random sequence h(0), h(1), .... For maximal
efficiency, we use the last and least significant character as the head. Thus, a key $x = (x_{c-1}, \ldots, x_0) \in \Sigma^c$
has head(x) = $x_0$ and tail(x) = $x_{>0} = (x_{c-1}, \ldots, x_1)$. For twisted tabulation, we use a simple tabulation
function $h^* : \Sigma^{c-1} \to \Sigma \times [2^r]$ and a character function $h_0 : \Sigma \to [2^r]$, and then h(x) is computed, setting
$(t, h_{>0}) = h^*(x_{>0})$ and $h(x) = h_0[x_0 \oplus t] \oplus h_{>0}$. We now think of x as the pair $(x_{>0}, x_0) \in \Sigma^{c-1} \times \Sigma$. As
we increase the index $x = 0, 1, \ldots, \Sigma - 1, \Sigma, \Sigma + 1, \ldots = (0,0), (0,1), \ldots, (0, \Sigma - 1), (1,0), (1,1), \ldots$,
the tail $x_{>0}$ only increases when $x_0$ wraps around to 0, that is, once in every $\Sigma$ calls. We therefore store
$(t, h_{>0}) = h^*(x_{>0})$ in a register, recomputing it only when $x_{>0}$ increases. Otherwise we compute
$h(x) = h_0[x_0 \oplus t] \oplus h_{>0}$ using just one character lookup and two $\oplus$-operations. We found this to be
exceedingly fast: as fast as a single multiplication and 4 times faster than the standard random number
generator random() from the GNU C library, which has almost no probabilistic guarantees. Besides being
faster, the twisted PRG offers the powerful distributional guarantees discussed above.
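
The register trick can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation measured above: the class name `TwistedPRG`, the 8-bit character size, and the use of Python's `random` module as a stand-in source for the table entries are all assumptions, and Python will not reproduce the reported speed. The point is only to show that the cached pair $(t, h_{>0})$ yields the same sequence as evaluating h(0), h(1), ... from scratch.

```python
import random

class TwistedPRG:
    """Sketch of the twisted tabulation PRG producing h(0), h(1), ..."""

    def __init__(self, c=4, r=32, char_bits=8, seed=None):
        rng = random.Random(seed)        # stand-in source of random table entries
        self.c, self.bits = c, char_bits
        self.sigma = 1 << char_bits      # alphabet size
        # h*: one table per tail character, entries are (twister, r-bit hash part).
        self.h_star = [[(rng.randrange(self.sigma), rng.getrandbits(r))
                        for _ in range(self.sigma)] for _ in range(c - 1)]
        # h_0: table indexed by the twisted head.
        self.h0 = [rng.getrandbits(r) for _ in range(self.sigma)]
        self.x = 0                       # next index in the sequence
        self.reg = None                  # cached (t, h_{>0}) for the current tail

    def _tail_hash(self, tail):
        """Simple tabulation h*(tail): XOR of one lookup per tail character."""
        t, h = 0, 0
        for i in range(self.c - 1):
            ti, hi = self.h_star[i][(tail >> (i * self.bits)) & (self.sigma - 1)]
            t ^= ti
            h ^= hi
        return t, h

    def hash(self, x):
        """Direct evaluation h(x) = h_0[x_0 xor t] xor h_{>0}."""
        t, h = self._tail_hash(x >> self.bits)
        return self.h0[(x & (self.sigma - 1)) ^ t] ^ h

    def next_value(self):
        """Next PRG output; the tail is rehashed only when the head wraps to 0."""
        head = self.x & (self.sigma - 1)
        if head == 0 or self.reg is None:
            self.reg = self._tail_hash(self.x >> self.bits)
        t, h = self.reg
        self.x += 1
        return self.h0[head ^ t] ^ h

prg = TwistedPRG(c=3, seed=1)
outputs = [prg.next_value() for _ in range(1000)]
assert outputs == [prg.hash(i) for i in range(1000)]
```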

As an alternative implementation, we note that $h^*$ is itself applied to consecutive numbers $x_{>0} =
0, 1, 2, \ldots$, so $h^*$ can also be implemented as a PRG. The $h^*$-PRG is only applied once for every $\Sigma$ numbers
generated by h, so the $h^*$-PRG can be much slower without affecting the overall performance. Instead of
implementing $h^*$ by simple tabulation, we could implement it with any logarithmically independent PRG,
thus not storing any tables for $h^*$, but instead generating each new value $h^*(x_{>0})$ on the fly as $x_{>0}$ increases.
We can view this as a general conversion of a comparatively slow but powerful PRG into an extremely fast
one preserving the many probabilistic properties of twisted tabulation. The recent PRG of Christiani and
Pagh [11] would be a good choice for the $h^*$-PRG if we do not want to implement it with simple tabulation.

Randomized Algorithms and Data Structures When using the twisted PRG in randomized algorithms
[36], we get the obvious advantage of the Chernoff-style bounds from Theorem 5, which is one of the basic
techniques needed [36, §4]. The $\varepsilon$-minwise hashing from Theorem 6 with $\varepsilon = \tilde{O}(1/u^{1/c})$ is important
in contexts where we want to assign randomized priorities to items. A direct example is treaps [44]. The
classic analysis [36, §8.2] of the expected operation time is based on each key in an interval having the same
chance of getting the lowest priority. Assigning the priorities with an $\varepsilon$-minwise hash function, the expected
cost is only increased by a factor $(1 + \varepsilon)$ compared with the unrealistic case of true randomness. In static
settings, we can also use this to generate a random permutation, sorting items according to priorities. This
sorting itself takes linear time since we essentially only need to look at the $\log n$ most significant bits to
sort n priorities. Using this order to pick items, we get that classic algorithms like Quicksort [36, §1] and
Binary Planar Partitions [36, §1.3] perform within an expected factor $(1 + \varepsilon)$ of what they would with true
randomness. With $\varepsilon = \tilde{O}(1/u^{1/c})$ as with our twisted PRG, this is very close to the expected performance
with true randomness.
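
The remark that sorting by priority only needs the most significant bits can be illustrated by a bucketing sketch. The function name `order_by_priority` is hypothetical, and the priorities in the usage lines come from Python's `random` module purely as a stand-in for hash values; the sketch shows the bucketing idea, not a tuned linear-time implementation.

```python
import random

def order_by_priority(priorities, r):
    """Return item indices in increasing order of their r-bit priorities.

    Items are first distributed by their top ~lg n bits; each bucket is then
    sorted, which is cheap when priorities are well spread.
    """
    n = len(priorities)
    if n == 0:
        return []
    top_bits = min(r, max(1, (n - 1).bit_length()))
    buckets = [[] for _ in range(1 << top_bits)]
    for i, p in enumerate(priorities):
        buckets[p >> (r - top_bits)].append(i)
    order = []
    for bucket in buckets:                 # buckets are visited in priority order
        bucket.sort(key=lambda i: priorities[i])
        order.extend(bucket)
    return order

rng = random.Random(7)                     # stand-in for hash-derived priorities
prios = [rng.getrandbits(32) for _ in range(1000)]
perm = order_by_priority(prios, 32)
assert [prios[i] for i in perm] == sorted(prios)
```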

4 Double tabulation and high independence 

Thorup [50] has shown that simple tabulation can also be used to get high independence if we apply it
twice. More precisely, we consider having two independent simple tabulation functions $h_0 : \Sigma^c \to \Sigma^d$ and
$h_1 : \Sigma^d \to [2^r]$, and then the claim is that $h_1 \circ h_0$ is likely to be highly independent. The main point from
[50] is that the first simple tabulation $h_0$ is likely to have an expander-type property.

More precisely, given a function $f : [u] \to \Sigma^d$, a key set $X \subseteq [u]$ has a unique output character if there
is a key $x \in X$ and a $j \in [d]$ such that for all $y \in X \setminus \{x\}$, $f(y)_j \ne f(x)_j$, that is, the jth output
character is unique to some x in X. We say that f is k-unique if each non-empty key set $Y \subseteq [u]$ of size at
most k has a unique output character. Siegel [46] noted that if f is k-unique and $h_1 : \Sigma^d \to [2^r]$ is a random
simple tabulation function, then $h_1 \circ f : [u] \to [2^r]$ is k-independent. The main technical result from [50] is

Theorem 7 ([50]) Consider a random simple tabulation function $h_0 : \Sigma^c \to \Sigma^d$. Assume $c = \Sigma^{o(1)}$ and
$(c + d)^c = \Sigma^{o(1)}$. Let $k = \Sigma^{1/(5c)}$. With probability $1 - o(\Sigma^{2 - d/(2c)})$, the function $h_0$ is k-unique. More
concretely, for 32-bit keys with 16-bit characters, $h_0$ is 100-unique with probability $1 - 1.5 \times 10^{-42}$.

Assuming that $h_0$ is k-unique, if $h_1 : \Sigma^d \to [2^r]$ is a random simple tabulation function, then $h_1 \circ h_0$ is
k-independent.
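
As a rough illustration of the definitions, the following Python sketch builds a double tabulation function and checks by brute force whether a given key set has a unique output character. It is a sketch under stated assumptions, not the code of [50]: the helper names, the 8-bit characters, and Python's `random` module as the source of table entries are all illustrative choices.

```python
import random

def make_simple_tabulation(num_chars, out_bits, char_bits, rng):
    """Random simple tabulation: XOR of one table lookup per input character."""
    tables = [[rng.getrandbits(out_bits) for _ in range(1 << char_bits)]
              for _ in range(num_chars)]
    mask = (1 << char_bits) - 1

    def h(x):
        out = 0
        for i, table in enumerate(tables):
            out ^= table[(x >> (i * char_bits)) & mask]
        return out
    return h

def make_double_tabulation(c, d, r, char_bits=8, seed=None):
    """Return (h0, h1, h) with h0: Sigma^c -> Sigma^d, h1: Sigma^d -> [2^r], h = h1 o h0."""
    rng = random.Random(seed)
    h0 = make_simple_tabulation(c, d * char_bits, char_bits, rng)
    h1 = make_simple_tabulation(d, r, char_bits, rng)
    return h0, h1, (lambda x: h1(h0(x)))

def has_unique_output_character(f, keys, d, char_bits=8):
    """True if, at some position j, some character f(x)_j is produced by exactly one key."""
    mask = (1 << char_bits) - 1
    for j in range(d):
        counts = {}
        for x in set(keys):
            ch = (f(x) >> (j * char_bits)) & mask
            counts[ch] = counts.get(ch, 0) + 1
        if 1 in counts.values():
            return True
    return False

h0, h1, h = make_double_tabulation(c=2, d=4, r=32, seed=3)
assert h(12345) == h1(h0(12345))
```

Checking k-uniqueness directly would require testing every key set of size at most k, which is infeasible; the point of Theorem 7 is that this property holds with high probability without any check.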

This construction for highly independent hashing is much simpler than that of Siegel [46] mentioned in
Section 1, and for $d = O(c)$, the evaluation takes $O(c)$ time as opposed to the $O(c)^c$ time used by Siegel.

Complementing the above result, Dahlgaard et al. [13] have proved that double tabulation is likely to be
truly random for any specific set S with less than $(1 - \Omega(1))\Sigma$ keys:

Theorem 8 ([13]) Given a set $S \subseteq [u]$ of size $(1 - \Omega(1))\Sigma$, consider two random simple tabulation functions
$h_0 : \Sigma^c \to \Sigma^d$ and $h_1 : \Sigma^d \to [2^r]$. With probability $1 - O(\Sigma^{1 - \lfloor d/2 \rfloor})$, every non-empty subset $X \subseteq S$ gets
a unique output character with $h_0$, and then the double tabulation function $h_1 \circ h_0$ is fully random over S.

It is interesting to compare Theorem 8 and Theorem 7. Theorem 8 holds for one large set while Theorem 7 
works for all small sets. Also, Theorem 8 with d = 4 “derived” characters gets essentially the same error 
probability as Theorem 7 with d = 6c derived characters. 
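
To see why the two error probabilities match, one can substitute these choices of d into the failure probabilities $O(\Sigma^{1 - \lfloor d/2 \rfloor})$ of Theorem 8 and $o(\Sigma^{2 - d/(2c)})$ of Theorem 7:

$$\text{Theorem 8, } d = 4:\ \ \Sigma^{1 - \lfloor 4/2 \rfloor} = \Sigma^{-1}, \qquad \text{Theorem 7, } d = 6c:\ \ \Sigma^{2 - 6c/(2c)} = \Sigma^{2 - 3} = \Sigma^{-1}.$$

Both exponents equal $-1$, so the bounds agree up to the hidden $O(\cdot)$ and $o(\cdot)$ factors.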

Siegel [46] has proved that with space $\Sigma$, we cannot in constant time hope to get independence higher
than $\Sigma^{1 - \Omega(1)}$, which is much less than the size of the given set in Theorem 8.

Theorem 8 provides an extremely simple O(n) space implementation of a constant time hash function
that is likely uniform on any given set S. This should be compared with the previous linear space uniform
hashing of Pagh and Pagh [37, §3]. The most efficient implementation of [37, §3] uses the highly
independent double tabulation from Theorem 7 as a subroutine. However, as discussed earlier, double tabulation
uses many more derived characters for high independence than for uniformity on a given set, so for
linear space uniform hashing on a given set, it is much faster and simpler to use the double tabulation of
Theorem 8 directly. We note that [37, §4] presents a general trick to reduce the space from linear, that is,
$O(n(\lg n + \lg |\mathcal{R}|))$ bits, down to $(1 + \varepsilon) n \lg |\mathcal{R}| + O(n)$ bits, preserving the constant evaluation time. This
reduction can also be applied to Theorem 8 so that we also get a simpler overall construction for a succinct
dictionary using $(1 + \varepsilon) n \lg |\mathcal{R}| + O(n)$ bits of space and constant evaluation time.

Very recently, Christiani et al. [12] have shown that, using a more elaborate recursive tabulation
scheme, we can get quite close to Siegel's lower bound.

Theorem 9 ([12, Corollary 3]) For word-size w, and parameters k and $c = O(w/(\log k))$, with probability
$1 - u^{-1/c}$, we can construct a k-independent hash function $h : [2^w] \to [2^w]$ in $O(c k u^{1/c})$ time and space that
is evaluated in $O(c \log c)$ time.

In Theorem 7, we used the same space to get independence $\Sigma^{1/(5c)} = u^{1/(5c^2)}$ and evaluation time $O(c)$, and the
construction of Theorem 7 is simpler. We should thus use Theorem 9 if we need its very high independence,
but if, say, logarithmic independence suffices, then Theorem 7 is the better choice.

A major open problem is to get the space and independence of Theorem 9 but with $O(c)$ evaluation time,
matching the lower bound of [46]. In its full generality, the lower bound from [46] says that for independence
k with $c < k$ cell probes, we need space $\Omega(k (u/k)^{1/c} c)$.

Invertible Bloom filters with simple tabulation Theorem 8 states that if a random simple tabulation
function $h_0 : \Sigma^c \to \Sigma^d$ is applied to a given set S of size $(1 - \Omega(1))\Sigma$, then with probability
$1 - O(\Sigma^{1 - \lfloor d/2 \rfloor})$, every non-empty subset $X \subseteq S$ gets a unique output character. This is not only
relevant for fully-random hashing. This property is also sufficient for the hash function in Goodrich and
Mitzenmacher's Invertible Bloom Filters [25], which have found numerous applications in streaming and
databases [21, 22, 35]. So far Invertible Bloom Filters have been implemented with fully random hashing,
but Theorem 8 shows that simple tabulation suffices for the underlying hash function.

4.1 k-partitions via mixed tabulation

The general goal of Dahlgaard et al. [13] is a hash function for k-partitioning a set into bins so that we get
good concentration bounds when combining statistics from each bin.

To understand this point, suppose we have a fully random hash function applied to a set X of red and
blue balls. We want to estimate the fraction f of red balls. The idea of Minwise hashing is to sample the ball
with the smallest hash value. This sample is uniformly random and is red with probability f. If we repeat the
experiment k times with k independent hash functions, we get a multiset S of k samples with replacement
from X and the fraction of red balls in S concentrates around f as we increase the number of samples.

Consider the alternative experiment using a single hash function, where we use some bits of the hash
value to partition X into k bins, and then use the remaining bits as a local hash value. We pick the ball with
the smallest hash value in each bin. This is a sample S from X without replacement, and again, the fraction
of red balls is concentrated around f.

The big difference between the two schemes is that the second one runs $\Omega(k)$ times faster. In the first
experiment, each ball participated in k independent experiments, but in the second one with k-partitions,
each ball picks its bin, and then only participates in the local experiment for that bin. Thus with the
k-partition, essentially, we get k experiments for the price of one.
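
The second experiment can be sketched in a few lines of Python. The function names are hypothetical, and `make_random_hash` is only a stand-in that simulates a fully random hash function with a lazily filled table; it is not one of the tabulation schemes discussed here. Each ball is hashed once: the low bits select the bin and the remaining bits serve as the local hash value.

```python
import random

def make_random_hash(seed, bits=64):
    """Stand-in for a fully random hash function (lazily filled table)."""
    rng, table = random.Random(seed), {}

    def h(x):
        if x not in table:
            table[x] = rng.getrandbits(bits)
        return table[x]
    return h

def k_partition_sample(keys, h, k_bits):
    """Per bin, keep the key with the smallest local hash value (one hash per key)."""
    k = 1 << k_bits
    best = [None] * k                       # (local hash, key) per bin
    for x in keys:
        hv = h(x)
        b, local = hv & (k - 1), hv >> k_bits
        if best[b] is None or (local, x) < best[b]:
            best[b] = (local, x)
    return [entry[1] for entry in best if entry is not None]

def estimate_fraction(keys, is_red, h, k_bits):
    """Estimate the fraction of red keys from the k-partition sample."""
    sample = k_partition_sample(keys, h, k_bits)
    return sum(1 for x in sample if is_red(x)) / len(sample)

h = make_random_hash(seed=0)
balls = range(10000)                        # hypothetical universe of balls
estimate = estimate_fraction(balls, lambda x: x % 4 == 0, h, k_bits=6)
```

With k = 64 bins the loop touches each ball once, whereas k independent repetitions of plain Minwise sampling would hash each ball k times.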

This generic idea has been used for different types of statistics. Flajolet and Martin [23] introduced it 
to count the number of distinct items in a multiset, Charikar et al. [10] used it in their count sketches for 
fast estimation of the second moment of a data stream, and recently, Li et al. [32, 45] used it for Minwise 
estimation of the Jaccard Similarity of two sets. 

The issue is that no realistic hashing scheme was known to make a good enough k-partition for the
above kind of statistics to make sense. The point is that the contents of different bins may be too correlated,
and then we get no better concentration with a larger k. In the independence paradigm of Carter and
Wegman [52], it would seem that we need independence at least k to get sufficiently independent statistics from
the different bins.

An efficient solution is based on a variant of double tabulation described below. 

Mixed tabulation For Theorem 8 we may use d = 4 even if c is larger, but then $h_0$ will introduce many
collisions. To avoid this problem we mix the schemes in mixed tabulation. Mathematically, we use two
simple tabulation hash functions $h_1 : [u] \to \Sigma^d$ and $h_2 : \Sigma^{c+d} \to [2^r]$, and define the hash function
$h(x) = h_2(x \circ h_1(x))$, where $\circ$ denotes concatenation of characters. We call $x \circ h_1(x)$ the derived key,
consisting of c original characters and d derived characters. Since the derived keys include the original
keys, there are no duplicate keys.

We note that mixed tabulation only requires c + d lookups if we instead store simple tabulation functions
$h_{1,2} : \Sigma^c \to \Sigma^d \times [2^r]$ and $h'_2 : \Sigma^d \to [2^r]$, computing h(x) by $(v_1, v_2) = h_{1,2}(x)$ with $v_1 \in \Sigma^d$
and $v_2 \in [2^r]$; $h(x) = h'_2(v_1) \oplus v_2$. This
efficient implementation is similar to that of twisted tabulation, and is equivalent to the previous definition.
As long as we have at least one derived character, mixed tabulation has all the distribution properties of
twisted tabulation, particularly, the Chernoff-style concentration bound from Theorem 5. At the same time,
we get the full randomness from Theorem 8 for any given set S of size $(1 - \Omega(1))\Sigma$. Based on these
properties and more, it is proved in [13] that mixed tabulation, w.h.p., gets essentially the same concentration
bounds as full randomness for all of the above mentioned statistics based on k-partitions.
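
The equivalence of the two formulations of mixed tabulation can be sketched in Python as follows. The class name `MixedTabulation`, the 8-bit characters, and the use of Python's `random` module to fill the tables are assumptions made for illustration; this is not the implementation from [13]. The method `hash` uses c + d lookups with the combined tables, while `hash_by_definition` first computes the derived characters and then applies a simple tabulation function to the derived key.

```python
import random

class MixedTabulation:
    """Sketch of mixed tabulation with c original and d derived characters."""

    def __init__(self, c, d, r, char_bits=8, seed=None):
        rng = random.Random(seed)           # stand-in source of table entries
        self.c, self.d, self.bits = c, d, char_bits
        sigma = 1 << char_bits
        self.mask = sigma - 1
        # Combined tables: each entry packs (d derived characters, r-bit hash part).
        self.t12 = [[(rng.getrandbits(d * char_bits), rng.getrandbits(r))
                     for _ in range(sigma)] for _ in range(c)]
        # Tables for the d derived characters.
        self.t2 = [[rng.getrandbits(r) for _ in range(sigma)] for _ in range(d)]

    def hash(self, x):
        """c + d lookups: one pass over x yields derived characters and a partial hash."""
        derived, out = 0, 0
        for i in range(self.c):
            dv, hv = self.t12[i][(x >> (i * self.bits)) & self.mask]
            derived ^= dv
            out ^= hv
        for j in range(self.d):
            out ^= self.t2[j][(derived >> (j * self.bits)) & self.mask]
        return out

    def hash_by_definition(self, x):
        """Simple tabulation of the derived key (x followed by its derived characters)."""
        derived = 0
        for i in range(self.c):             # derived characters of x
            derived ^= self.t12[i][(x >> (i * self.bits)) & self.mask][0]
        out = 0
        for i in range(self.c):             # contribution of the c original characters
            out ^= self.t12[i][(x >> (i * self.bits)) & self.mask][1]
        for j in range(self.d):             # contribution of the d derived characters
            out ^= self.t2[j][(derived >> (j * self.bits)) & self.mask]
        return out

mt = MixedTabulation(c=2, d=2, r=32, seed=5)
assert all(mt.hash(x) == mt.hash_by_definition(x) for x in range(0, 65536, 97))
```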

A Conditioning on the hash value of a query key

We are here going to argue that the Chernoff-style bounds from Theorem 5 hold when we, for a given query
key q and hash value a, condition everything on h(q) = a. This will be a simple extension of the proof of
Theorem 5 in [42, §3] without the condition. Below we first describe the essential points from the proof in
[42, §3], referring the reader to [42, §3] for more details. Afterwards we describe the extension to handle
the condition.

No conditioning. In [42, §3], the twisted tabulation hash function is picked in 3 steps.

1. We randomly fix $h^T$ twisting the keys. This defines the twisted groups $G_\alpha$ consisting of keys with the
same twisted head $\alpha$.

2. We randomly fix the $h^S_i$, $i > 0$, hashing the tails of the keys.

3. We randomly fix $h^S_0$, hashing the twisted heads. This finalizes the twisted tabulation hashing
$h = h^S \circ h^T$.

Let $V_\alpha = \sum_{x \in G_\alpha} v_x(h(x))$ be the total value of keys in twisted group $G_\alpha$. The main technical point in [42,
§3] is to prove that, w.h.p., after step 2, we will have $V_\alpha \le d = O(1)$ no matter how we fix $h^S_0[\alpha]$, and
this holds simultaneously for all twisted groups. While $h^S_0[\alpha]$ has not been fixed, h(x) is uniform in [u] for
every $x \in G_\alpha$, so the expected value $\mu_\alpha = E[V_\alpha] = \sum_{x \in G_\alpha} \mu_x$ over $G_\alpha$ is unchanged before step 3. It
also follows from [42, §3] that $\mu_\alpha \le d$. Our final value is $V = \sum_{\alpha \in \Sigma} V_\alpha$, which still has the correct mean
$E[V] = \mu$. Moreover, V is Chernoff-concentrated since it is summing d-bounded variables $V_\alpha$ as we fix the
$h^S_0[\alpha]$ in step 3. We are here ignoring some technicalities explained in [41, 42], e.g., how to formally handle
the case where some variable $V_\alpha$ is not d-bounded.

Conditioning on h(q) = a. We now consider the effects of the condition h(q) = a. First we note
that since twisted tabulation is 2-independent, the condition h(q) = a does not affect the expected value
$\mu_x = E[v_x(h(x))]$ of any key $x \ne q$.

Now, for the above step-wise fixing of the twisted tabulation hash function, we note that the only effect
of the condition h(q) = a is in step 3. If $\alpha_0$ is the twisted head of the query key, then instead of picking
$h^S_0[\alpha_0]$ at random, we have to fix it as $h^S_0[\alpha_0] = a \oplus \bigoplus_{i=1}^{c-1} h^S_i[q_i]$. Since the first steps were not affected,
w.h.p., for all $\alpha \in \Sigma$, we still have that $V_\alpha \le d = O(1)$ no matter how we fix $h^S_0[\alpha]$, and this includes the
twisted head $\alpha_0$.

We are now again considering $V = \sum_{\alpha \in \Sigma} V_\alpha$ where each $V_\alpha$ is d-bounded. For all $\alpha \ne \alpha_0$, we have
$E[V_\alpha] = \mu_\alpha$. However, $V_{\alpha_0}$ is fixed to some value in [0, d] when, after steps 1-2, we are forced to set
$h^S_0[\alpha_0] = a \oplus \bigoplus_{i=1}^{c-1} h^S_i[q_i]$. The error $|V_{\alpha_0} - \mu_{\alpha_0}|$ from this fixing is less than d, and this has no effect on
our error probability bounds except in cases where Markov's bound takes over. However, Markov's bound
holds directly for twisted tabulation without any of the above analysis. Thus we conclude that Theorem 5
holds also when we condition on h(q) = a.